Skip to content

feat(serve): improve OpenAI API compatibility with usage, finish_reas…#771

Merged
ajbozarth merged 6 commits into
generative-computing:mainfrom
markstur:serve
Apr 2, 2026
Merged

feat(serve): improve OpenAI API compatibility with usage, finish_reas…#771
ajbozarth merged 6 commits into
generative-computing:mainfrom
markstur:serve

Conversation

@markstur
Copy link
Copy Markdown
Contributor

@markstur markstur commented Apr 1, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

feat(serve): improve OpenAI API compatibility with usage, finish_reason, and system_fingerprint

Add missing OpenAI API fields to m serve chat completion responses:

  • Add finish_reason field to Choice model (defaults to "stop")
  • Add usage field with token counts (prompt_tokens, completion_tokens, total_tokens)
    extracted from ModelOutputThunk.usage when available
  • Add system_fingerprint field populated from model or provider metadata
  • Fix bug in model_options extraction (use model_dump().items() instead of iterating request)

Includes comprehensive test coverage with 13 unit tests verifying all new fields
and edge cases (missing data, partial data, fallback behavior).

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code as added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

…on, and system_fingerprint

Add missing OpenAI API fields to m serve chat completion responses:

- Add `finish_reason` field to Choice model (defaults to "stop")
- Add `usage` field with token counts (prompt_tokens, completion_tokens, total_tokens)
  extracted from ModelOutputThunk.usage when available
- Add `system_fingerprint` field populated from model or provider metadata
- Fix bug in model_options extraction (use model_dump().items() instead of iterating request)

Includes comprehensive test coverage with 13 unit tests verifying all new fields
and edge cases (missing data, partial data, fallback behavior).

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur requested a review from a team as a code owner April 1, 2026 04:04
@github-actions github-actions Bot added the enhancement New feature or request label Apr 1, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 1, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

Comment thread test/cli/test_serve.py Outdated
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@ajbozarth
Copy link
Copy Markdown
Contributor

I used Claude to do a PR review, here are a few highlight you may want to look into:

Issues

  1. system_fingerprint semantics (design concern)
    The PR populates system_fingerprint with output.model (e.g., "gpt-4"). In OpenAI's spec, system_fingerprint is a hash/fingerprint of backend config, not the model name. The model name is already in response.model. This
    conflates two different fields.

    Options: use None always (cleanest), derive a real fingerprint hash, or rename/repurpose the field with a comment noting the divergence from spec.

  2. Partial usage data: total_tokens silently wrong
    In test_usage_with_partial_data, when only prompt_tokens: 10 is provided:

    assert response.usage.total_tokens == 0  # Should default to 0                                                                                                                                                                         

    The comment says "should default to 0" but 0 is an incorrect total. It should be prompt_tokens + completion_tokens at minimum. This is a silent bad value going out over the API.

  3. Unused patch import in test file
    from unittest.mock import Mock, patch # patch never used
    Ruff will catch this but worth fixing before it fails the pre-commit hook.


Minor / Nits

  • test_completion_id_format asserts len(response.id) == 38 — brittle if ID generation changes. len(response.id) > len("chatcmpl-") or a regex would be more robust.
  • patch unused import will fail ruff check.

markstur added 4 commits April 1, 2026 13:39
system_fingerprint should be a backend config hash, not the model name.
The model name is already in response.model. Set to None since we don't
currently track backend config fingerprints.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
When partial usage data is provided (e.g., only prompt_tokens),
total_tokens was incorrectly defaulting to 0. Now it's calculated
as prompt_tokens + completion_tokens when not explicitly provided.

This prevents silent bad values from going out over the OpenAI-compatible API.

- Modified cli/serve/app.py to calculate total_tokens when missing
- Updated test_usage_with_partial_data to expect correct behavior

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Replace hardcoded length assertion with implementation-agnostic check.
The test now validates that the ID has the correct prefix and a non-empty
suffix, without coupling to specific length or format details.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur
Copy link
Copy Markdown
Contributor Author

markstur commented Apr 1, 2026

  • patch unused import will fail ruff check.

I never got that error in precommit. 🤷‍♂️

Updated with fixes for the comments

@markstur markstur requested a review from ajbozarth April 1, 2026 21:43
Copy link
Copy Markdown
Contributor

@ajbozarth ajbozarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude and I agree this LGTM, might want one more approval before merge for sanity

Copy link
Copy Markdown
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; thank you

@ajbozarth ajbozarth added this pull request to the merge queue Apr 2, 2026
Merged via the queue into generative-computing:main with commit c887d60 Apr 2, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants